The timely diagnosis of acute leukemias (AL) can be challenging in resource-constrained settings. Patients, particularly in low- and middle-income countries, face various barriers to accessing specialized diagnostics. Delays in diagnosis and referral, especially for patients with acute promyelocytic leukemia (APL), increase early mortality (Rego Blood 2013, Odetola and Tallman ASH Educ Program 2023). Recently, routine laboratory features were used to develop and test machine learning (ML) classification algorithms for predicting AL types in multicenter French cohorts (Alcazer, Lancet Digital Health, 2024). However, the global generalizability of these algorithms has not been extensively tested.
Methods:
To test these algorithms, we assembled a multicenter retrospective cohort of patients with diagnosed AL from 9 countries. Laboratory features (total leukocyte, monocyte, and lymphocyte counts, platelets, MCV, MCHC, LDH, fibrinogen, prothrombin activity in %) and age were obtained at the earliest timepoint of leukemia diagnosis at hospital contact. The cohort was diverse in ethnicity, social background, and age (range 0.08-97 years) and included both sexes (female 42.7%) as well as adult (≥ 18 years, n=1025) and pediatric patients (n=1771). The top-performing model in the development cohort, an extreme gradient boosting (XGB) model, was used for testing. A Python package was developed that provides data preparation from HL7/FHIR or CSV tables, predictions using an embedded R script, and evaluation using Weights & Biases. The model was run separately for each site to account for cohort heterogeneity. The cutoff for missing features was 20%. Feature importance was analyzed using SHapley Additive exPlanations (SHAP) values. Misclassified patients were further analyzed with respect to the clinical significance of their features and by statistical, machine-learning, and dimensionality reduction methods. This study was approved by the ethics committee of the University of Duisburg-Essen (N°24-11882-BO).
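The missing-feature cutoff and the per-site runs described above can be illustrated with a minimal Python sketch. It is not the study's package: the feature key names, the record layout, and the example values are hypothetical, and it assumes that the 20% cutoff applies per patient and is inclusive, which the Methods do not specify.

```python
from collections import defaultdict

# The ten model inputs named in the Methods; the key names are hypothetical.
FEATURES = [
    "wbc", "monocytes", "lymphocytes", "platelets", "mcv",
    "mchc", "ldh", "fibrinogen", "prothrombin_pct", "age",
]
MAX_MISSING_FRACTION = 0.20  # cutoff stated in the Methods


def missing_fraction(record):
    """Return the share of model features that are absent or None in a record."""
    missing = sum(1 for name in FEATURES if record.get(name) is None)
    return missing / len(FEATURES)


def eligible_by_site(records):
    """Group records by site, keeping those within the missing-feature cutoff.

    Assumption: the cutoff is applied per patient and is inclusive (<= 20%).
    """
    by_site = defaultdict(list)
    for record in records:
        if missing_fraction(record) <= MAX_MISSING_FRACTION:
            by_site[record["site"]].append(record)
    return dict(by_site)


# Made-up records for illustration only; these are not study data.
records = [
    {"site": "A", "wbc": 12.0, "monocytes": 0.4, "lymphocytes": 2.1,
     "platelets": 35, "mcv": 92, "mchc": 33, "ldh": 480,
     "fibrinogen": 1.2, "prothrombin_pct": 55, "age": 41},
    {"site": "A", "wbc": 3.0, "age": 7},  # 8 of 10 features missing
    {"site": "B", "wbc": 48.0, "monocytes": 5.2, "lymphocytes": 3.0,
     "platelets": 60, "mcv": 101, "mchc": 34, "ldh": None,
     "fibrinogen": None, "prothrombin_pct": 80, "age": 66},  # 2 of 10 missing
]

eligible = eligible_by_site(records)
for site, site_records in sorted(eligible.items()):
    print(site, len(site_records))
```

Each per-site list could then be passed to the prediction step on its own, mirroring the separate run per site.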
Results:
In 2796 patients with diagnosed AL, the algorithm's “confident” predictions, as previously published, reached peak median AUROC values of up to 99.7 for APL, 98.8 for acute myeloid leukemia (AML) and 98.8 for acute lymphoblastic leukemia (ALL). High scores for “confident” predictions were obtained in cohorts from Europe (e.g. AML F1 score 0.97 [95%CI, 0.972-0.973]), Asia (e.g. ALL F1 score 0.94 [95%CI, 0.937-0.943]) and Latin America (e.g. AML F1 score 0.98 [95%CI, 0.976-0.978]). “Confident” predictions, however, were available for only 5% to 41% of patients, depending on the cohort. The accuracy of the “base” predictions of AL varied across sites and countries. The model predicted APL with median AUROC between 0.79 and 0.98 and other types of AML with median AUROC between 0.60 and 0.87. The best “base” performance for AML and APL was recorded with the data from Salamanca, indicating some feature dependence of the algorithm.
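For readers less familiar with the metrics reported above, the following self-contained Python sketch shows a one-vs-rest AUROC, a per-class F1 score, and a percentile bootstrap interval for F1. The abstract does not state how its confidence intervals were derived, so the bootstrap is an assumption; the function names and the example labels and probabilities are made up for illustration.

```python
import random


def auroc(scores, is_positive):
    """AUROC as the chance that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for s, y in zip(scores, is_positive) if y]
    negatives = [s for s, y in zip(scores, is_positive) if not y]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def f1_score(true_labels, predicted_labels, target):
    """F1 for one class (target) against all other classes."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == target and p == target)
    fp = sum(1 for t, p in pairs if t != target and p == target)
    fn = sum(1 for t, p in pairs if t == target and p != target)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def bootstrap_f1_ci(true_labels, predicted_labels, target,
                    n_resamples=1000, seed=0):
    """Percentile 95% interval for F1 (an assumed method, not the study's)."""
    rng = random.Random(seed)
    n = len(true_labels)
    stats = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(f1_score([true_labels[i] for i in idx],
                              [predicted_labels[i] for i in idx], target))
    stats.sort()
    lower = stats[(n_resamples * 25) // 1000]
    upper = stats[(n_resamples * 975) // 1000 - 1]
    return lower, upper


# Made-up example: six patients, predicted APL probability and predicted class.
true = ["APL", "AML", "ALL", "AML", "APL", "ALL"]
apl_probability = [0.9, 0.2, 0.1, 0.6, 0.5, 0.3]
predicted = ["APL", "AML", "ALL", "APL", "AML", "ALL"]

apl_auroc = auroc(apl_probability, [t == "APL" for t in true])
apl_f1 = f1_score(true, predicted, "APL")
apl_f1_ci = bootstrap_f1_ci(true, predicted, "APL")
print(apl_auroc, apl_f1, apl_f1_ci)
```

The pairwise AUROC loop is quadratic in cohort size; it is written this way for readability, and a rank-based implementation would be preferable for large cohorts.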
In the pediatric subsets, ALL was the most frequently diagnosed leukemia, and cohorts reached a median AUROC of 0.78 (range 0.65-0.78), similar to adult ALL. However, the algorithm, originally developed on adult cohorts, did not generalize well to pediatric AML: its F1 scores (range 0.32-0.40) were lower than those for pediatric ALL (range 0.68-0.72). We examined potential limitations of the algorithm, e.g. misclassified patients, to identify sources of bias. Higher proportions of missing values reduced the precision of the predictions, which is why we refined the missing-value cutoff. In SHAP analysis, the most important features across predictions were prothrombin activity and monocyte count; LDH was also important for ALL, MCV and age for AML, and fibrinogen and MCHC for APL. AML patients with low or missing monocyte counts were misclassified as ALL. A few AML patients with impaired coagulation (e.g. PT <60) and normal leukocyte counts were misclassified as APL. ALL patients with high monocyte counts, higher MCV, and lower LDH were misclassified as AML. We adjusted the scripts for these limitations and for statistical outliers to improve the algorithm's applicability in clinical practice.
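The misclassification review described above starts from a tally of which true diagnosis was predicted as which class, followed by a look at the features of the affected patients. A minimal Python sketch of that first step is given here; the case layout, the helper names, and all values are hypothetical and are not study data.

```python
from collections import Counter


def confusion_counts(cases):
    """Count cases per (true diagnosis, predicted diagnosis) pair."""
    return Counter((case["true"], case["predicted"]) for case in cases)


def feature_values(cases, true, predicted, feature):
    """Collect one feature for all cases with the given true/predicted pair."""
    return [case.get(feature) for case in cases
            if case["true"] == true and case["predicted"] == predicted]


# Made-up cases for illustration; None marks a missing monocyte count.
cases = [
    {"true": "AML", "predicted": "ALL", "monocytes": 0.1},
    {"true": "AML", "predicted": "ALL", "monocytes": None},
    {"true": "AML", "predicted": "AML", "monocytes": 2.4},
    {"true": "ALL", "predicted": "AML", "monocytes": 3.1},
    {"true": "APL", "predicted": "APL", "monocytes": 0.3},
]

counts = confusion_counts(cases)
# Keep only the off-diagonal pairs, i.e. the misclassifications.
errors = {pair: n for pair, n in counts.items() if pair[0] != pair[1]}
aml_as_all_monocytes = feature_values(cases, "AML", "ALL", "monocytes")
print(errors)
print(aml_as_all_monocytes)
```

Listing the feature values per error pair makes patterns such as low or missing monocyte counts among AML cases predicted as ALL directly visible.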
Conclusion:
Inclusive ML tools can reduce access barriers in hematology. This first international validation of an ML tool to support the diagnosis of AL provides important insight into its validity and practical use. Validating the model on more patients and countries will further inform its generalizability.
Turki: Onkowissen.tv: Speakers Bureau; Maat Pharma: Consultancy; Biomarin: Speakers Bureau; Neovii: Other: Travel reimbursements; Novartis: Other: Travel reimbursements; Janssen: Other: Travel reimbursements; CSL Behring: Consultancy; Pfizer: Consultancy. Reinhardt: Medac, BMS, Immedica: Research Funding. Voso: Novartis: Other: Research support, Speakers Bureau; Celgene/BMS: Other: Research support, Advisory Board, Speakers Bureau; Syros: Other: Advisory Board; Astra Zeneca: Speakers Bureau; Abbvie: Speakers Bureau; Jazz: Other: Advisory Board, Speakers Bureau; Astellas: Speakers Bureau. Nensa: Siemens Healthineers: Research Funding.